Result list for search "db:Swepub ;pers:(Jantsch Axel);pers:(Lu Zhonghai);hsvcat:2"

Results 1-10 of 98

No.	Reference
1.	Naeem, Abdul, et al. (author). Architecture Support and Comparison of Three Memory Consistency Models in NoC based Syst (2012). In: Proceedings of 15th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD 2012). IEEE Computer Society. ISBN 9780769547985, pp. 304-311. Conference paper (peer-reviewed). Abstract: We propose a novel hardware support for three relaxed memory models, Release Consistency (RC), Partial Store Ordering (PSO) and Total Store Ordering (TSO), in Network-on-Chip (NoC) based distributed shared memory multicore systems. The RC model is realized by using a Transaction Counter and an Address Stack based approach, while the PSO and TSO models are realized by using a Write Transaction Counter and a Write Address Stack based approach. In the experiments, we use a configurable platform based on a 2D mesh NoC using a deflection routing policy. The results show that under synthetic workloads, the average execution time for the RC, PSO and TSO models in an 8x8 network (64 cores) is reduced by 35.8%, 22.7% and 16.5%, respectively, over the Sequential Consistency (SC) model. The average speedup for the RC, PSO and TSO models in the 8x8 network under different application workloads is increased by 34.3%, 10.6% and 8.9%, respectively, over the SC model. The area cost for the TSO, PSO and RC models is increased by less than 2% over the SC model at the interface to the processor.
2.	Anagnostopoulos, Iraklis, et al. (author). Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations (2011). In: IEEE Embedded Systems Letters. ISSN 1943-0663. 3:2, pp. 66-69. Journal article (peer-reviewed). Abstract: Multiprocessor systems-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions regarding the constraints and the opportunities delivered by the physical distribution of multiple memory nodes of the platform. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization. Customization is enabled at application- and platform-level. Results show that customized microcoded DMM can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.
3.	Candaele, Bernard, et al. (author). Mapping Optimisation for Scalable multi-core ARchiTecture: The MOSART approach (2010). In: Proceedings - IEEE Annual Symposium on VLSI, ISVLSI 2010. ISBN 9780769540764, pp. 518-523. Conference paper (peer-reviewed). Abstract: The project will address two main challenges of prevailing architectures: 1) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; 2) The difficulties in programming heterogeneous, multi-core platforms, in particular in dynamically managing data structures in distributed memory. MOSART aims to overcome these through a multi-core architecture with distributed memory organisation, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimised and customised together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: A) Providing platform support for management of abstract data structures, including middleware services and a run-time data manager for NoC based communication infrastructure; B) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.
4.	Candaele, Bernard, et al. (author). The MOSART Mapping Optimization for multi-core Architectures (2011). In: VLSI 2010 Annual Symposium. Dordrecht: Springer Publishing Company. pp. 181-195. Conference paper (peer-reviewed). Abstract: The MOSART project addresses two main challenges of prevailing architectures: (i) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; (ii) The difficulties in programming heterogeneous, multi-core platforms. MOSART aims to overcome these through a multi-core architecture with distributed memory organization, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimized and customized together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: (i) Providing platform support for management of abstract data structures including middleware services and a run-time data manager for NoC based communication infrastructure; (ii) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.
5.	Chen, Xiaowen, et al. (author). Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips (2010). In: 3rd IEEE International Conference on Computer and Electrical Engineering (ICCEE). Conference paper (peer-reviewed). Abstract: Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in a large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in this paper, we propose a fast barrier synchronization mechanism, targeting multi-core NoCs. The fast barrier synchronization mechanism includes a dedicated hardware module, named Fast Barrier Synchronizer (FBS), integrated with each processor node. It offers a set of barrier counters and can concurrently process synchronization requests issued by the local node and remote nodes via the on-chip network. The salient feature of our fast barrier synchronization mechanism is that, once the barrier condition is reached, the “barrier release” acknowledgement is routed to all processor nodes in a broadcast way in order to save chip area by avoiding storing source node information and to minimize completion time by avoiding serialization of barrier releasing. Synthesis results suggest that the FBS can run over 1 GHz in SMIC® 130nm technology with small area overhead. We implemented an FBS-enhanced multi-core NoC architecture on our FPGA platform using the Xilinx® Virtex 5 as the FPGA chip. FPGA utilization and simulation results show that our fast barrier synchronization demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.
6.	Chen, Xiaowen, et al. (author). Cooperative communication based barrier synchronization in on-chip mesh architectures (2011). In: IEICE Electronics Express. Institute of Electronics, Information and Communications Engineers (IEICE). ISSN 1349-2543. 8:22, pp. 1856-1862. Journal article (peer-reviewed). Abstract: We propose cooperative communication as a means to enable efficient and scalable barrier synchronization on mesh-based many-core architectures. Our approach is different from but orthogonal to conventional algorithm-based optimizations. It relies on collaborating routers to provide efficient gather and multicast communication. In conjunction with a master-slave algorithm, it exploits the mesh regularity to achieve efficiency. The gather and multicast functions have been implemented in our router. Synthesis results suggest marginal area overhead. With synthetic and benchmark experiments, we show that our approach significantly reduces synchronization completion time and increases speedup.
7.	Chen, Xiaowen, et al. (author). Cooperative communication for efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs (2014). In: IEICE Electronics Express. Institute of Electronics, Information and Communications Engineers (IEICE). ISSN 1349-2543. 11:18, pp. 20140542-. Journal article (peer-reviewed). Abstract: On many-core Network-on-Chips (NoCs), communication is on the critical path of system performance and contended synchronization requests may cause a large performance penalty. Different from conventional algorithm-based approaches, the paper addresses the barrier synchronization problem from the angle of optimizing its communication performance and proposes cooperative communication as a means to achieve efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. With the cooperative communication, routers collaborate with one another to accomplish a fast barrier synchronization task. The cooperative communication is implemented in our router at low cost. Through comparative experiments, our approach evidently exhibits high efficiency and good scalability.
8.	Chen, Xiaowen, et al. (author). Handling Shared Variable Synchronization in Multi-core Network-on-Chips with Distributed Memory (2010). In: Proceedings. ISBN 9781424466832, pp. 467-472. Conference paper (peer-reviewed). Abstract: Parallelized shared variable applications running on multi-core Network-on-Chips (NoCs) require efficient support for synchronization, since communication is on the critical path of system performance and contended synchronization requests may cause a large performance penalty. In this paper, we propose a dedicated hardware module for synchronization management. This module is called Synchronization Handler (SH), integrated with each processor-memory node on the multi-core NoCs. It uses two physical buffers to concurrently process synchronization requests issued by the local processor and remote processors via the on-chip network. One salient feature is that the two physical buffers are dynamically allocated to form multiple virtual buffers (a virtual buffer is related to a shared synchronization variable) so as to improve the buffer utilization and alleviate the head-of-line blocking. Synthesis results suggest that the SH can run over 900 MHz in 130nm technology with small area overhead. To justify the SH-enhanced multicore NoCs, we employ synthetic workloads to evaluate synchronization cost and buffer utilization, and run synchronization-intensive applications to investigate speedup. The results show that our approach is viable.
9.	Chen, Xiaowen, et al. (author). Multi-FPGA Implementation of a Network-on-Chip Based Many-core Architecture with Fast Barrier Synchronization Mechanism (2010). In: Proceedings of the IEEE Norchip Conference. ISBN 9781424489732. Conference paper (peer-reviewed). Abstract: In this paper, we propose a fast barrier synchronization mechanism, targeting Network-on-Chip based manycore architectures. Its salient feature is that, once the barrier condition is reached, the "barrier release" acknowledgement is routed to all processor nodes in a broadcast way in order to save area by avoiding storing source node information and to minimize completion time by eliminating serialization of barrier releasing. Then, we construct a multi-FPGA platform using Xilinx® Virtex 5 as FPGA chips and implement a NoC based many-core architecture on it. FPGA utilization and simulation results show that our mechanism demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.
10.	Chen, Xiaowen, et al. (author). Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property (2013). In: Computers & electrical engineering. Elsevier BV. ISSN 0045-7906, 1879-0755. 39:2, pp. 596-612. Journal article (peer-reviewed). Abstract: In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs a performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property, which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for those data with changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%.
